
64 research outputs found

Compressed Bit-sliced Signature Files An Index Structure for Large Lexicons

Author: Can Fazli
Carterette Ben
Publication date: 01/04/1999

We use the signature file method to search for partially specified terms in large lexicons. To optimize efficiency, we use the concepts of the partially evaluated bit-sliced signature file method and memory resident data structures. Our system employs signature partitioning, compression, and term blocking. We derive equations to obtain system design parameters, and measure indexing efficiency in terms of time and space. The resulting approach provides good response time and is storage-efficient. In the experiments we use four different lexicons, and show that the signature file approach outperforms the inverted file approach in certain efficiency aspects. KEYWORDS: Lexicon search, n-grams, signature files
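The bit-sliced signature idea in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: bigrams, a 64-bit signature, and CRC32 hashing are choices made here, not parameters from the paper, and the function names are hypothetical. The paper's partial evaluation, signature partitioning, compression, and term blocking are all omitted.

```python
import zlib

SIG_BITS = 64  # assumed signature width, not taken from the paper


def ngrams(text, n=2):
    """Return the overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def bit_positions(text):
    """Map each n-gram to one signature bit (superimposed coding)."""
    return {zlib.crc32(g.encode()) % SIG_BITS for g in ngrams(text)}


def build_slices(lexicon):
    """Build one bit slice per signature position.

    Bit i of slices[b] is set when term i has signature bit b set.
    """
    slices = [0] * SIG_BITS
    for idx, term in enumerate(lexicon):
        for b in bit_positions(term):
            slices[b] |= 1 << idx
    return slices


def search(fragment, lexicon, slices):
    """Find terms containing fragment: AND the slices, then verify."""
    candidates = (1 << len(lexicon)) - 1  # start with every term
    for b in bit_positions(fragment):
        candidates &= slices[b]
    # Signatures can yield false drops, so confirm each candidate.
    return [t for i, t in enumerate(lexicon)
            if (candidates >> i) & 1 and fragment in t]


lexicon = ["retrieval", "signature", "lexicon", "relevance"]
slices = build_slices(lexicon)
print(search("lev", lexicon, slices))
```

Only the slices for the query's bits are read, which is what makes the bit-sliced layout attractive; the final substring check removes any false drops caused by hash collisions.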

Scholarly Commons @ MiamiOH (Miami University)

Incremental Test Collections

Author: Allan James
Carterette Ben
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2005

Corpora and topics are readily available for information retrieval research. Relevance judgments, which are necessary for system evaluation, are expensive; the cost of obtaining them prohibits in-house evaluation of retrieval systems on new corpora or new topics. We present an algorithm for cheaply constructing sets of relevance judgments. Our method intelligently selects documents to be judged and decides when to stop in such a way that with very little work there can be a high degree of confidence in the result of the evaluation. We demonstrate the algorithm's effectiveness by showing that it produces small sets of relevance judgments that reliably discriminate between two systems. The algorithm can be used to incrementally design retrieval systems by simultaneously comparing sets of systems. The number of additional judgments needed after each incremental design change decreases at a rate reciprocal to the number of systems being compared. To demonstrate the effectiveness of our method, we evaluate TREC ad hoc submissions, showing that with 95% fewer relevance judgments we can reach a Kendall's tau rank correlation of at least 0.9.

CiteSeerX

Crossref

ScholarWorks@UMass Amherst

On rank correlation and the distance between rankings

Author: Ben Carterette
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

Rank correlation statistics are useful for determining whether there is a correspondence between two measurements, particularly when the measures themselves are of less interest than their relative ordering. Kendall’s τ in particular has found use in Information Retrieval as a “meta-evaluation” measure: it has been used to compare evaluation measures, evaluate system rankings, and evaluate predicted performance. In the meta-evaluation domain, however, correlations between systems confound relationships between measurements, practically guaranteeing a positive and significant estimate of τ regardless of any actual correlation between the measurements. We introduce an alternative measure of distance between rankings that corrects this by explicitly accounting for correlations between systems over a sample of topics, and moreover has a probabilistic interpretation for use in a test of statistical significance. We validate our measure with theory, simulated data, and experiment.
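For readers unfamiliar with the statistic the abstract above builds on, here is a minimal sketch of the standard Kendall's τ (the τ-a variant, which ignores ties in the denominator). It is not the alternative distance measure the paper introduces; the function name `kendall_tau` and the sample scores are illustrative assumptions.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length score lists.

    Counts concordant minus discordant pairs, divided by the
    total number of pairs n * (n - 1) / 2.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for three systems under two evaluation measures.
measure_a = [0.31, 0.27, 0.22]
measure_b = [0.40, 0.18, 0.25]
print(kendall_tau(measure_a, measure_b))
```

Here two of the three system pairs are ordered the same way by both measures and one pair is swapped, so τ comes out at one third.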

CiteSeerX

Crossref

Query Resolution for Conversational Search with Limited Supervision

Author: Bajaj Payal
Belkin Nicholas J
Carterette Ben
Dalton Jeffrey
Devlin Jacob
Nguyen Tri
Vaswani Ashish
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

In this work we focus on multi-turn passage retrieval as a crucial component of conversational search. One of the key challenges in multi-turn passage retrieval comes from the fact that the current turn query is often underspecified due to zero anaphora, topic change, or topic return. Context from the conversational history can be used to arrive at a better expression of the current turn query, defined as the task of query resolution. In this paper, we model the query resolution task as a binary term classification problem: for each term appearing in the previous turns of the conversation decide whether to add it to the current turn query or not. We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers. We propose a distant supervision method to automatically generate training data by using query-passage relevance labels. Such labels are often readily available in a collection either as human annotations or inferred from user interactions. We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC. We incorporate QuReTeC in a multi-turn, multi-stage passage retrieval architecture and demonstrate its effectiveness on the TREC CAsT dataset. Comment: SIGIR 2020 full conference pape
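The framing of query resolution as binary term classification, described in the abstract above, can be illustrated with a small sketch. Everything here is an assumption for illustration: `resolve_query` and `toy_classifier` are hypothetical names, whitespace tokenization is a simplification, and the toy lookup stands in for the transformer-based classifier the paper actually trains.

```python
from typing import Callable, List


def resolve_query(history: List[str], current: str,
                  keep_term: Callable[[str, List[str], str], bool]) -> str:
    """Append to the current query every history term labeled positive.

    keep_term is the binary classifier: it returns True when a term
    from earlier turns should be added to the current turn query.
    """
    seen = set(current.lower().split())
    added = []
    for turn in history:
        for term in turn.lower().split():
            if term in seen:
                continue  # classify each distinct term only once
            seen.add(term)
            if keep_term(term, history, current):
                added.append(term)
    return " ".join([current] + added)


def toy_classifier(term, history, current):
    """Stand-in for a trained model: a fixed set of positive terms."""
    return term in {"bronx", "zoo"}


history = ["tell me about the bronx zoo"]
print(resolve_query(history, "when does it open", toy_classifier))
```

Passing the classifier in as a callable keeps the expansion step independent of how the per-term decisions are produced.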

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Report from Dagstuhl Seminar 23031: Frontiers of Information Access Experimentation for Research and Education

Author: Bauer Christine
Carterette Ben
Faggioli Guglielmo
Ferro Nicola
Fuhr Norbert
Publication date: 01/01/2023

This report documents the program and the outcomes of Dagstuhl Seminar 23031 "Frontiers of Information Access Experimentation for Research and Education", which brought together 37 participants from 12 countries. The seminar addressed technology-enhanced information access (information retrieval, recommender systems, natural language processing) and specifically focused on developing more responsible experimental practices leading to more valid results, both for research and for scientific education. The seminar brought together experts from various sub-fields of information access, namely IR, RS, NLP, information science, and human-computer interaction, to create a joint understanding of the problems and challenges presented by next generation information access systems, from both the research and the experimentation points of view, to discuss existing solutions and impediments, and to propose next steps to be pursued in the area in order to improve not only our research methods and findings but also the education of the new generation of researchers and developers. The seminar featured a series of long and short talks delivered by participants, who helped set a common ground and let topics of interest emerge to be explored as the main output of the seminar. This led to the definition of five groups which investigated challenges, opportunities, and next steps in the following areas: reality check, i.e. conducting real-world studies, human-machine-collaborative relevance judgment frameworks, overcoming methodological challenges in information retrieval and recommender systems through awareness and education, results-blind reviewing, and guidance for authors. Comment: Dagstuhl Seminar 23031, report

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server


Overview of the TREC 2011 Session Track

Author: Carterette Ben
Clough Paul
Hall Mark
Kanoulas Evangelos
Sanderson Mark
Publication date: 01/01/2011

Open Research Online (The Open University)

Edge Hill University Research Information Repository

Recommending Podcasts for Cold-Start Users Based on Music Listening and Taste

Author: Carterette Ben
Charbuillet Christophe
Charrier Denis
Laurent Martin
Nazari Zahra
Pages Johan
Vecchione Briana
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/07/2020

Recommender systems are increasingly used to predict and serve content that aligns with user taste, yet the task of matching new users with relevant content remains a challenge. We consider podcasting to be an emerging medium with rapid growth in adoption, and discuss challenges that arise when applying traditional recommendation approaches to address the cold-start problem. Using music consumption behavior, we examine two main techniques in inferring Spotify users' preferences over more than 200k podcasts. Our results show significant improvements in consumption of up to 50% for both offline and online experiments. We provide extensive analysis on model performance and examine the degree to which music data as an input source introduces bias in recommendations. Comment: SIGIR 202

arXiv.org e-Print Archive

Crossref